FACE SUB-SPACE DETERMINATION

The present invention relates to the determination of sub-spaces of facial 
variations. 

Facial variation divides into a number of functional sub-spaces. An improved
method of measuring these was designed, within the space defined by an Appearance
Model. Initial estimates of the sub-spaces (lighting, pose, identity, expression) were
obtained by Principal Components Analysis on appropriate groups of faces. An
iterative algorithm was applied to image codings to maximise the probability of
coding across these non-orthogonal sub-spaces before obtaining the projection on
each sub-space and recalculating the spaces. Tests show this procedure enhances
identity recognition, reduces overall sub-space variance and produces Principal
Components with greater span and less contamination.

1 Introduction

Facial variation can be conceptually divided into a number of 'functional'
sub-spaces: types of variation which reflect useful facial dimensions [1]. A possible
selection of these face-spaces is: identity, expression (here including all transient
plastic deformations of the face), pose and lighting. Other spaces may be extracted,
the most obvious being age. When designing a practical face-analysis system, at
least one of these sub-spaces must be isolated and modelled. For example, a security
application will need to recognise individuals regardless of expression, pose and
lighting, while a lip-reader will concentrate only on expression. In certain
circumstances, accurate estimates of all the sub-spaces are needed, for example when
'transferring' face and head movements from a video-sequence of one individual to
another to produce a synthetic sequence.

Although face-images can be fitted adequately using an appearance-model
space which spans the images, it is not possible to linearly separate the different
sub-spaces [7]. Thus we simultaneously apportion image weights between initial
overlapping estimates of these functional spaces in proportion with the sub-space
variance. This divides the faces into a set of non-orthogonal projections, allowing an
iterative approach to a set of pure, but overlapping, spaces. These are more specific
than the initial spaces, improving identity recognition.

2 Background

Facial coding requires the approximation of a manifold, or high-dimensional surface,
on which any face can be said to lie. This allows accurate coding, recognition and
reproduction of previously unseen examples. A number of previous studies [2, 3, 4]
have suggested that using a shape-free coding provides a ready means of doing this, at
least when the range of pose-angle is relatively small, perhaps ±20° [5]. Here,
the correspondence problem between faces is first solved by finding a pre-selected set
of distinctive points (corners of eyes or mouths, for example) which are present in all
faces. This is typically performed by hand during training. Those pixels thus defined
as being part of the face can be warped to a standard shape by standard grey-level
interpolation techniques, ensuring that the image-wise and face-wise co-ordinates of a
given image are equivalent. If a rigid transformation to remove scale, location and
orientation effects is performed on the point-locations, they can then be treated in the
same way as the grey-levels, as again identical values for corresponding points on
different faces will have the same meaning.

Although these operations will linearise the space, allowing interpolation
between pairs of faces, they do not give an estimate of the dimensions. Thus, the
acceptability as a face of an object cannot be measured; this reduces recognition [2].
In addition, redundancies between feature-point location and grey-level values cannot
be described. Both these problems can be addressed by Principal Components
Analysis. This extracts a set of orthogonal eigenvectors $\Phi$ from the covariance
matrix of the images (either the pixel grey-levels, or the feature-point locations).
Combined with the eigenvalues, this provides an estimate of the dimensions and range
of the face-space. The weights $w$ of a face $q$ can then be found,

$w = \Phi^{T}(q - \bar{q})$ (1)

and this gives the Mahalanobis distance

$d_{12}^{2} = \sum_{k} \frac{(w_{1k} - w_{2k})^{2}}{\lambda_{k}}$ (2)

between faces $q_1$ and $q_2$ (with $\lambda_k$ the $k$-th eigenvalue), coding in terms of
expected variation [6]. Redundancies between shape and grey-levels are removed by
performing separate PCAs upon the shape and grey-levels, before the weights of the
ensemble are combined to form single vectors on which a second PCA is performed [3].
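
As a minimal sketch of Equations 1 and 2, the following Python (NumPy) fragment fits a
PCA to a small matrix of face vectors, computes the weights of a face and measures the
Mahalanobis distance between two faces. The function names, the 1/n covariance
normalisation and the random toy data are assumptions made for illustration only; they
are not taken from the system described here.

```python
import numpy as np

def fit_pca(faces):
    """PCA of row-wise face vectors: returns mean, eigenvectors (columns), eigenvalues."""
    mean = faces.mean(axis=0)
    centred = faces - mean
    cov = centred.T @ centred / len(faces)   # assumed 1/n normalisation
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]        # re-order to descending variance
    return mean, eigvecs[:, order], eigvals[order]

def face_weights(face, mean, eigvecs):
    """Equation 1: project the mean-centred face onto the eigenvectors."""
    return eigvecs.T @ (face - mean)

def mahalanobis_sq(w1, w2, eigvals, eps=1e-12):
    """Equation 2: squared weight differences scaled by the eigenvalues."""
    keep = eigvals > eps                     # skip directions with no variance
    diff = (w1 - w2)[keep]
    return float(np.sum(diff ** 2 / eigvals[keep]))

rng = np.random.default_rng(0)
faces = rng.normal(size=(20, 5))             # 20 toy "faces" with 5 values each
mean, eigvecs, eigvals = fit_pca(faces)
w_a = face_weights(faces[0], mean, eigvecs)
w_b = face_weights(faces[1], mean, eigvecs)
print(mahalanobis_sq(w_a, w_b, eigvals))
```

Because every eigenvector is kept in this toy case, the weights reproduce the face
exactly; a practical model would truncate the eigenvectors to a chosen share of the
variance.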

This 'appearance model' allows the description of the face in terms of true
variation: the distortions needed to move from one to another. The following studies
are performed embedded within this representation. However, it will code the entire
space as specified by our set of images, as can be seen in Figure 1. Thus, for example,
the distance between the representations of two images will be a combination of the
identity, facial expression, angle and lighting conditions. These must be separated to
allow detailed analysis of the face image.

Figure 1: The first two dimensions of the face-space as defined by the appearance
model. From the left: -2 s.d., the mean, +2 s.d. The eigenfaces vary on identity,
expression, pose and lighting.

3 Available Data

Although estimates of the sub-spaces might be gained from external
codes of every face on each type of variation, these are typically not available.
Rather, different sets, each showing major variation on one sub-space alone, were
used. The sets comprised:

1. A lighting set, consisting of 5 images of a single male individual, all photographed
fronto-parallel and with a fixed, neutral expression. The sitter was lit by a single lamp,
moved around his face.

2. A pose set, comprising 100 images of 10 different sitters, 10 images per sitter.
The sitters had pointed their heads in a variety of two-dimensional directions, of
relatively consistent angle. Expression and lighting changes were minimal.

3. An expression set, with 397 images of 19 different sitters, each making seven basic
expressions: happy, sad, afraid, angry, surprised, neutral and disgusted. These images
showed notable person-specific lighting variation, and some pose variation.

4. An identity set, with 188 different images, one per sitter. These were all
fronto-parallel, in flat lighting and with neutral expressions. However, as is inevitable
with any large group of individuals, there was considerable variation in the apparent
expression adopted as neutral.

4 Appearance Model Construction

All the images had a uniform set of 122 landmarks found manually. An
example of an ensemble image with landmarks is shown in Figure 2. A triangulation
was applied to the points, and bilinear interpolation was used to warp the images to a
standard shape and size which would yield a fixed number of pixels.

Figure 2: An example of an ensemble image (from the expression set),
showing the correspondence points.

For testing purposes, the feature points were found using a multi-resolution 
Active Appearance Model [9], constructed using the ensemble images, but without 
grey-level normalisation. 

Since the images were gathered with a variety of cameras, it was necessary to
normalise the lighting levels. For a given pixel, a grey-level of, say, 128/256 has a
different meaning in one shape-normalised image from another. The shape-free
grey-level patch $g$ was sampled from the shape-normalised image. To minimise the
effect of global lighting variation, this patch was normalised at each point $j$ to give

$g'_{j} = \frac{g_{j} - \mu_{j}}{\sigma_{j}}$ (3)

where $\mu_{j}$, $\sigma_{j}$ are the mean and standard deviation.
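
The normalisation of Equation 3 might be sketched as below. The text does not state
over which samples the mean and standard deviation are taken; this sketch assumes they
are computed per pixel across the set of shape-free patches. The function name and the
small `eps` guard against constant pixels are likewise illustrative assumptions.

```python
import numpy as np

def normalise_patches(patches, eps=1e-8):
    """Apply Equation 3 per pixel j to an (n_images, n_pixels) array."""
    mu = patches.mean(axis=0)                 # mean grey-level at each pixel
    sigma = patches.std(axis=0)               # standard deviation at each pixel
    return (patches - mu) / np.maximum(sigma, eps)   # guard against flat pixels

rng = np.random.default_rng(1)
patches = rng.uniform(0, 255, size=(6, 4))    # 6 toy patches of 4 pixels each
normalised = normalise_patches(patches)
```

After this step every pixel has zero mean and unit standard deviation over the set, so
a given grey-level value carries the same meaning in each image.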

These operations allowed the construction of an appearance model [3] coding
99.5% of the variation in the 690 images, each with 19826 pixels in the face area.
This required a total of 636 eigenvectors.

5 Sub-space Extraction

5.1 Linear Methods 

It was initially hoped that the different sub-spaces might be linearly separable.
In that case, it would be possible to obtain the sub-spaces by successively projecting
the faces through the spaces defined by the other categories of faces and then taking
the coding error as the data for the subsequent PCA. However, tests showed that this
was not practical. The fourth and final set of components consistently coded little but
noise. A procedure where each sub-space removed only facial codes within its own
span (typically ±2 s.d.) did produce a usable fourth set, but the application was
essentially arbitrary, and only used a small sub-set to calculate each sub-space.

5.2 Non-linear Recoding 

This strongly suggested that it might be possible to extract the relevant data in
a more principled manner, using the relevant variation present in each image-set. The
basic problem is that each of the sub-spaces specified by the ensembles will code both
the desired, 'official' variance, and an unknown mixture of the other types. This
contamination will stem mostly from a lack of control of the relevant facial factors, so
for example, the 'neutral' expressions seen in the identity set actually contain a range
of different, low-intensity expressions. Examples of the starting identity eigenfaces
are shown in Figure 3, showing the limited identity span of this ensemble.

In addition, there is no guarantee that the desired, 'pure' principal components
for one sub-space will be orthogonal with the others. This follows from the ultimate
linking factors, notably the three-dimensional face shape and the size and location of
facial musculature. Significant improvements in tracking and recognition are possible
by learning the path through face-space taken by a sequence of face-images [10, 8].
This suggests both that these relationships may be susceptible to second-order
modelling, and that the estimates of the modes of variation given by the ensembles
will be biased by the selection of images.

These considerations suggest a scheme which allows the removal of the
contaminating variance from the non-orthogonal estimates of sub-spaces, and also the
use of the largest possible number of images. One possible method is offered by the
differences in variance on the principal components extracted from the various
ensembles.

Figure 3: The first two dimensions of the identity face-space. From the left: -2 s.d.,
the mean, +2 s.d. The eigenfaces vary mostly on identity and lighting.

Assuming that the ensembles predominantly code the types of variance they
are intended to, the eigenvalues for the 'signal' components should be larger than
those of the 'noise'. The 'signal' components should also be somewhat more
orthogonal to one another, and should certainly be less affected by minor changes in
the ensembles which create them.

The implication is that derived components may be improved by coding
images on the over-exhaustive multiple sub-spaces in proportion to their variance,
then approximating the images on the separate sub-spaces and recalculating the
multiple spaces. Given a number of iterations, this should lead to a set of stable, and
rather more orthogonal, sub-spaces which code only the desired features.

Assuming that only a pair of sub-spaces, I and E, are involved, for a given
vector $q$ the projection through the combined sub-spaces described by eigenvectors
$i_j$ and $e_k$ (with associated eigenvalues $\lambda^{i}_{j}$ and $\lambda^{e}_{k}$) is given by

$q' = \sum_{j} w^{i}_{j} i_{j} + \sum_{k} w^{e}_{k} e_{k}$ (4)

with the constraint that

$E = \sum_{j} \frac{(w^{i}_{j})^{2}}{\lambda^{i}_{j}} + \sum_{k} \frac{(w^{e}_{k})^{2}}{\lambda^{e}_{k}}$ (5)

be minimised. In practice, if $M$ is the matrix formed by concatenating I and E, $D$
is the diagonal matrix of $\lambda^{i}$ and $\lambda^{e}$, and $\mathbf{I}$ is the identity matrix,

$w = \frac{D M^{T} (q - \bar{q})}{D M^{T} M + \mathbf{I}}$ (6)

and this also gives a projected version of the face

$q' = \frac{(D M^{T} M + \mathbf{I})\, w}{D M^{T}} + \bar{q}$ (7)

with $w_j = 0$ for those sub-spaces not required.

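
One reading of Equation 6 is sketched below, treating the quotient as left
multiplication by the inverse of the denominator. Under that reading the weights are
those which minimise the squared coding residual plus the sum of squared weights
divided by their eigenvalues. The function names and the toy matrices are hypothetical,
and the reconstruction step uses the plain linear form (mean plus `M` times the
retained weights) in place of the quotient form of Equation 7; that substitution is an
assumption of this sketch.

```python
import numpy as np

def code_on_subspaces(q, q_mean, M, eigvals):
    """Equation 6: weights on the concatenated, non-orthogonal sub-spaces."""
    D = np.diag(eigvals)
    A = D @ M.T @ M + np.eye(M.shape[1])
    return np.linalg.solve(A, D @ M.T @ (q - q_mean))

def project_on(w, q_mean, M, keep):
    """Rebuild the face from selected sub-spaces (other weights set to zero)."""
    return q_mean + M @ np.where(keep, w, 0.0)

# Toy example: two 1-D sub-spaces in a 3-D space that are not orthogonal.
M = np.array([[1.0, 0.6],
              [0.0, 0.8],
              [0.0, 0.0]])                    # columns are unit eigenvectors
eigvals = np.array([4.0, 1.0])                # variance of each component
q_mean = np.zeros(3)
q = np.array([2.0, 1.0, 0.0])

w = code_on_subspaces(q, q_mean, M, eigvals)
first_only = project_on(w, q_mean, M, np.array([True, False]))
```

Components with larger eigenvalues are penalised less, so shared variation is
apportioned towards the sub-space in which it is more probable.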
5.3 Implementation 

The first stage was to subtract the overall mean from each face, so ensuring
that the mean of each sub-space was as close to zero as possible. Separate PCAs were
then performed upon the image sets, discarding any further difference between the
group and overall means. While the covariance matrices for the identity and lighting
sub-spaces were calculated as

$C_{T} = \frac{1}{n} \sum_{i=1}^{n} (q_{i} - \bar{q})(q_{i} - \bar{q})^{T}$ (8)

the pose and expression used

$C_{W} = \frac{1}{n_{o} n_{p}} \sum_{i=1}^{n_{p}} \sum_{k=1}^{n_{o}} (q_{ki} - \bar{q}_{i})(q_{ki} - \bar{q}_{i})^{T}$ (9)

where $n_o$ is the number of observations per individual, $n_p$ is the number of
individuals, and $\bar{q}_i$ the mean of individual $i$. Although all the eigenvectors implied by
the identity, lighting and expression sets were used, only the two most variable from
the pose set were extracted.
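
The two kinds of covariance described above, one taken about the mean of the whole set
and one taken about each individual's own mean and pooled over individuals, can be
sketched as follows. The function names, the 1/n normalisation and the toy data are
assumptions for illustration.

```python
import numpy as np

def total_covariance(Q):
    """Covariance of all images about the set mean (rows of Q are images)."""
    centred = Q - Q.mean(axis=0)
    return centred.T @ centred / len(Q)

def pooled_within_covariance(Q, person_ids):
    """Covariance about each individual's own mean, pooled over individuals."""
    person_ids = np.asarray(person_ids)
    centred = np.empty_like(Q, dtype=float)
    for pid in np.unique(person_ids):
        rows = person_ids == pid
        centred[rows] = Q[rows] - Q[rows].mean(axis=0)   # remove the person-mean
    return centred.T @ centred / len(Q)

# Two sitters, two images each; the sitters differ far more than their images do.
Q = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 5.0], [12.0, 5.0]])
ids = [0, 0, 1, 1]
C_total = total_covariance(Q)
C_within = pooled_within_covariance(Q, ids)
```

In this toy case the pooled within-individual matrix retains only the small
image-to-image change along the first axis, while the total covariance is dominated by
the difference between the two sitters.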

The eigenvectors were combined to form $M_1$ and the projected version for
each face on each sub-space found using Equations 6 and 7. This yielded four
ensembles, each with 690 images. On its own, this procedure would result in the loss
of some useful variation. For example, the identity component of the expression and
pose images was unlikely to be coded precisely by the identity set alone. Thus the
difference between each original image and the version projected through the whole
set of eigenfaces was found, and this 'error image' was divided between the four
projected versions in proportion to the ratios of the sums of the eigenvalues of the four
sub-spaces.
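
The division of the 'error image' can be illustrated with a short sketch. The function
name, the dictionary layout and the two-sub-space toy values are hypothetical; only the
rule of sharing the residual in proportion to the summed eigenvalues comes from the
description above.

```python
import numpy as np

def share_error(original, full_projection, sub_projections, sub_eigvals):
    """Add to each sub-space projection its share of the uncoded 'error image'."""
    error = original - full_projection
    totals = {name: float(np.sum(vals)) for name, vals in sub_eigvals.items()}
    grand_total = sum(totals.values())
    return {name: proj + error * (totals[name] / grand_total)
            for name, proj in sub_projections.items()}

original = np.array([1.0, 2.0, 3.0])
full_projection = np.array([0.8, 1.6, 3.0])        # what the combined space coded
sub_projections = {"identity": np.array([0.5, 1.0, 2.0]),
                   "lighting": np.array([0.3, 0.6, 1.0])}
sub_eigvals = {"identity": np.array([6.0, 3.0]),   # eigenvalues sum to 9
               "lighting": np.array([1.0])}        # eigenvalues sum to 1
recoded = share_error(original, full_projection, sub_projections, sub_eigvals)
```

Here the identity projection receives nine tenths of the residual and the lighting
projection one tenth, so no part of the original image is discarded before the next
round of PCAs.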
A further four PCAs were performed on the recoded images (all using
Equation 8), extracting the same number of components as on the previous PCA for
the lighting, pose and expression sub-spaces, and all the non-zero components for the
identity sub-space. These formed $M_2$, and the original faces re-projected on this
second-level estimate of the sub-spaces give a third-level estimate, and so forth. The
final result with regard to the identity images is shown in Figure 4. In comparison
with those in Figure 1 the facial dimensions appear to have the same identities, but are
normalised for expression, pose and lighting.

Figure 4: The first two dimensions of the identity face-space. From the left: -2 s.d.,
the mean, +2 s.d. The eigenfaces vary only on identity, the range of which has been
increased.

Since the identity space was allowed to vary the number of eigenfaces, while
the others were fixed, inevitably any noise present in the system will tend to
accumulate in the identity space, and will reduce recognition performance if a
Mahalanobis measure is taken. Thus once the system had stabilized, a final PCA on

$C_{B} = \frac{1}{n_{p}} \sum_{i=1}^{n_{p}} (\bar{q}_{i} - \bar{q})(\bar{q}_{i} - \bar{q})^{T}$ (10)

was applied to the identity projections of the complete set of images, coding 97% of
the variance. This allowed a final rotation to maximize between-person variance,
reducing the identity eigenvectors from 497 to 153. These rotated eigenfaces were
used only for recognition.

6 Results 

Convergence of the system was estimated by taking the Mahalanobis
distances between all the images on each of the sub-spaces. A Pearson
product-moment correlation was taken between the distances of successive iterations,
and allowed to converge to machine accuracy, although in practice a slightly lower
value would achieve the same results with reduced processing time. The system gave a
relatively smooth set of correlation coefficients as shown in Figure 5, converging in
approximately seven iterations. Since only 99.99% of the variance in the ensemble was
coded, to avoid problems with numerical accuracy, practical convergence was achieved
by iteration 4.

Figure 5: Changes in the correlations between the Mahalanobis distances separating
all the images on the multiple space between iteration n and n - 1.
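
A convergence check of this kind might be sketched as below: the Mahalanobis distances
between all pairs of images are computed at two successive iterations and compared with
a Pearson product-moment correlation. The function names, the toy weights and the
tolerance are hypothetical; for brevity the sketch also assumes the same eigenvalues at
both iterations.

```python
import numpy as np

def pairwise_mahalanobis(W, eigvals):
    """Distances between all image pairs; rows of W are weight vectors."""
    scaled = W / np.sqrt(eigvals)                 # unit variance per component
    diff = scaled[:, None, :] - scaled[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    upper = np.triu_indices(len(W), k=1)          # count each pair once
    return dist[upper]

def iteration_correlation(prev_dist, curr_dist):
    """Pearson product-moment correlation between successive iterations."""
    return float(np.corrcoef(prev_dist, curr_dist)[0, 1])

rng = np.random.default_rng(2)
eigvals = np.array([3.0, 2.0, 1.0])
W_prev = rng.normal(size=(8, 3))
W_curr = W_prev + 0.01 * rng.normal(size=(8, 3))  # small change between iterations
r = iteration_correlation(pairwise_mahalanobis(W_prev, eigvals),
                          pairwise_mahalanobis(W_curr, eigvals))
converged = (1.0 - r) < 1e-3                      # hypothetical tolerance
```

Comparing distances rather than eigenvectors keeps the test independent of sign flips
or re-ordering of the components between iterations.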

6.1 Coding errors 

Since the iterations involve the inclusion of information which failed to be
coded on the previous iteration, it should be expected that the difference between
original and projected images should decline. This should apply to both ensemble and
non-ensemble images as the eigenfaces become more representative.

This was tested by projecting the images through the combined spaces (using
Equations 6 and 7) and measuring the magnitude of the errors. This was performed
for both the ensemble images and also for a large test set (referred to as 'Manchester'),
first used in [11]. This consisted of 600 images of 30 individuals, divided in half: a
gallery of 10 images per person and a set of 10 probes per person. As can be seen in
Figure 6, in both cases, the errors quickly drop to a negligible level. As a comparison,
the two sets have mean magnitudes (total variance) of 11345 and 11807, measured on
the appearance-model eigenweights.

Figure 6: Mean coding errors for the ensemble and test images, across iterations.
Errors quickly decline to a negligible level in both cases. Errors on the individual
sub-spaces remain high (4,000 to 11,000).

6.2 Sub-space specificity 

The level of normalisation was measured on the Manchester set, calculating
the identity weights using Equation 6, and finding the person-mean $\bar{w}_i$. Better
removal of contaminating variance should reduce the variance for each individual,
relative to this mean. The variance,

$V = \frac{1}{n_{o} n_{p} N} \sum_{i=1}^{n_{p}} \sum_{k=1}^{n_{o}} \sum_{j=1}^{N} (w_{kij} - \bar{w}_{ij})^{2}$ (11)

was calculated. The results of this test in Figure 7 show a steady decline in the
identity sub-space variance. The only exception to this is the value for iteration two;
this is unusual in having a large increase in the number of dimensions, without an
opportunity to re-distribute this variation into the other sub-spaces.

The results of projecting the faces into the other sub-spaces are shown, as is
the variance in the appearance model. As might be expected, these are all higher than
the identity sub-space value, and do not show marked declines as the iterations
progress. Indeed, the pose variance increases slightly.

Figure 7: Mean within-person variances for the different sub-spaces as a function of
iteration number.
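
The within-person variance measure can be sketched as the mean squared deviation of
each weight from its person-mean, averaged over images and dimensions. The function
name and the toy weights are assumptions for illustration.

```python
import numpy as np

def within_person_variance(W, person_ids):
    """Mean squared deviation of each weight from its person-mean."""
    person_ids = np.asarray(person_ids)
    total = 0.0
    for pid in np.unique(person_ids):
        rows = W[person_ids == pid]
        total += np.sum((rows - rows.mean(axis=0)) ** 2)
    return float(total / W.size)      # averaged over images and dimensions

# Two people, two images each, coded on two identity dimensions.
W = np.array([[0.0, 0.0], [2.0, 2.0], [1.0, 5.0], [3.0, 5.0]])
ids = [0, 0, 1, 1]
v = within_person_variance(W, ids)    # (4 + 2) / 8
```

A perfectly normalised identity sub-space would code every image of a person
identically and so give a value of zero.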

6.3 Recognition 

Recognition was also tested on the Manchester set, coding the images on the
final rotated space. The Appearance Model used to provide correspondences did not
give completely accurate positions, lowering recognition. The pooled covariance
matrix $C_W$ was found using Equation 9 on the $w_i$. This allowed

$d^{2}_{i-k} = (w_{i} - \bar{w}_{k})^{T} C_{W}^{-1} (w_{i} - \bar{w}_{k})$ (12)

where $1 \le k \le (n_o \times n_p)$, to give Mahalanobis distances to the mean images. A
recognition was scored when the smallest $d$ had the same identity for $i$ and $k$. The
results are shown in Figure 8, and demonstrate that relative to the base condition,
recognition improves by about one percent on iteration 4. Also shown are the effects
of projecting the test images through the complete space to obtain the lighting-,
pose- and expression-normalised version, which is then coded on the final rotated
space. This does not produce an improvement in recognition. It should be noted here
that there may well be contingent, non-functional correlations between parameters on
different sub-spaces for individuals (for example, a consistent tendency to look up or
down), whose omission may trade off against theoretically preferable eigenfaces.

Figure 8: Recognition rates for Euclidean average-image matching.
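
Matching a probe to the nearest person-mean under a pooled covariance, as in
Equation 12, might be sketched as follows. The function name, the use of a
pseudo-inverse and the toy values are illustrative assumptions.

```python
import numpy as np

def recognise(probe, person_means, cov_within):
    """Return the index of the person-mean nearest to the probe, and all distances."""
    cov_inv = np.linalg.pinv(cov_within)          # pseudo-inverse for robustness
    diffs = person_means - probe
    d2 = np.einsum("ij,jk,ik->i", diffs, cov_inv, diffs)   # squared distances
    return int(np.argmin(d2)), d2

person_means = np.array([[0.0, 0.0], [3.0, 0.8], [-3.0, -0.8]])
cov_within = np.array([[4.0, 0.0], [0.0, 0.25]])  # images vary mostly on axis 0
probe = np.array([2.5, 0.1])
best, d2 = recognise(probe, person_means, cov_within)
```

In this toy case the probe is closest to the second mean in plain Euclidean terms, but
the first mean is selected once distances are scaled by the within-person variation,
which is large along the first axis.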

7 Conclusions 

Once an accurate coding system for faces has been achieved, the major
problem is to ensure that only a useful sub-set of the codes is used for any given
manipulation or measurement. This is a notably difficult task, as there are multiple,
non-orthogonal explanations of any given facial configuration. In addition, it is
typically the case that only a relatively small portion of the very large data-base
required will be present in the full range of conditions and with the labels needed for a
simple linear extraction.

We have shown that both of these problems can be overcome by using an
iterative recoding scheme, which takes into account both the variance of and
covariance between the sub-spaces which can be extracted to span sets of faces which
vary in different ways. This yields 'cleaner' eigenfaces, with lower
within-appropriate-group variance and higher inappropriate-group variance. Both these
facts reflect greater orthogonality between the sub-spaces. In addition, recognition on
an entirely disjoint test set was improved, although marginally. There are a number of
major possible applications in tracking, lip-reading and transfer of identity from one
person to another.

CLAIMS

1. A method of determining face sub-spaces, the method comprising making
initial estimates of the sub-spaces, for example lighting, pose, identity and expression,
using Principal Component Analysis on appropriate groups of faces, applying an
iterative algorithm to image codings to maximise the probability of coding across
these non-orthogonal sub-spaces, obtaining the projection on each sub-space, and
recalculating the spaces.

2. A method of determining face sub-spaces, the method comprising:

a. generating a first series of initial images in which a first predetermined
facial property is modified,

b. generating a second series of initial images in which a second
predetermined facial property is modified,

c. coding each series of images according to the variance of the images to
obtain an estimated sub-space for each facial property,

d. concatenating the sub-spaces to provide a single over-exhaustive
space,

e. approximating each image of the first and second series on the
estimated property sub-spaces to obtain approximated values of each image on
each estimated property sub-space,

f. generating a reformed version of each image using the approximated
values,

g. comparing the reformed version of each image with the initial image to
determine an error value for each image,

h. sub-dividing the error value for each image into a sub-error for each
estimated property sub-space in proportion to the variance of that sub-space,

i. combining each sub-error for each image with the approximated value
of that image on the estimated property sub-space, to obtain a new
approximated value in the property sub-space for each image,

j. coding the new approximated values of the images according to their
variance to obtain new estimated sub-spaces.

3. A method of determining face sub-spaces according to claim 2, further
comprising approximating each image on the new estimated sub-spaces as described
in steps 'a' to 'h' and then repeating steps 'e' to 'j' until the sub-spaces have
stabilised.

4. A method of determining face sub-spaces according to claim 2 or claim 3,
wherein four series of images are generated, a different predetermined facial property
being modified in each series.

5. A method according to claim 4, wherein the predetermined facial properties 
are categorised as identity, expression, pose and lighting. 

6. A method according to any of claims 2 to 5, wherein at least one further series
of images is generated, a further predetermined facial property being modified in the
series.

ABSTRACT

A method of determining face sub-spaces. The method comprises making
initial estimates of the sub-spaces, for example lighting, pose, identity and expression,
using Principal Component Analysis on appropriate groups of faces. The method
further comprises applying an iterative algorithm to image codings to maximise the
probability of coding across these non-orthogonal sub-spaces, obtaining the projection
on each sub-space, and recalculating the spaces.